Deep Predictive Policy Training using Reinforcement Learning
Skilled robot task learning is best implemented by predictive action policies
due to the inherent latency of sensorimotor processes. However, training such
predictive policies is challenging as it involves finding a trajectory of motor
activations for the full duration of the action. We propose a data-efficient
deep predictive policy training (DPPT) framework with a deep neural network
policy architecture which maps an image observation to a sequence of motor
activations. The architecture consists of three sub-networks referred to as the
perception, policy and behavior super-layers. The perception and behavior
super-layers force an abstraction of visual and motor data trained with
synthetic and simulated training samples, respectively. The policy super-layer
is a small sub-network with comparatively few parameters that maps data
between the abstracted manifolds. It is trained for each task using methods
for policy
search reinforcement learning. We demonstrate the suitability of the proposed
architecture and learning framework by training predictive policies for skilled
object grasping and ball throwing on a PR2 robot. The effectiveness of the
method is illustrated by the fact that these tasks are trained using only about
180 real robot attempts with qualitative terminal rewards.
Comment: This work is submitted to the IEEE/RSJ International Conference on
Intelligent Robots and Systems 2017 (IROS2017).
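A minimal PyTorch sketch of the three-super-layer idea described in this abstract, assuming a 64x64 single-channel observation, a small latent interface between super-layers, and a 7-joint arm with a fixed-length trajectory; all sizes, names, and module choices are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class DPPTSketch(nn.Module):
        """Illustrative perception -> policy -> behavior pipeline (assumed sizes)."""
        def __init__(self, latent_dim=5, n_joints=7, horizon=50):
            super().__init__()
            # Perception super-layer: image observation -> abstract visual state
            self.perception = nn.Sequential(
                nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                nn.Flatten(),
                nn.LazyLinear(latent_dim),
            )
            # Policy super-layer: small sub-network mapping between abstractions
            self.policy = nn.Sequential(
                nn.Linear(latent_dim, 16), nn.ReLU(),
                nn.Linear(16, latent_dim),
            )
            # Behavior super-layer: abstract motor state -> full motor trajectory
            self.behavior = nn.Sequential(
                nn.Linear(latent_dim, 128), nn.ReLU(),
                nn.Linear(128, n_joints * horizon),
            )
            self.n_joints, self.horizon = n_joints, horizon

        def forward(self, image):
            z_vis = self.perception(image)   # visual abstraction
            z_mot = self.policy(z_vis)       # task-specific mapping
            traj = self.behavior(z_mot)      # motor activations for the full action
            return traj.view(-1, self.horizon, self.n_joints)

    # Usage: a single observation yields the whole action trajectory.
    obs = torch.randn(1, 1, 64, 64)
    trajectory = DPPTSketch()(obs)  # shape: (1, 50, 7)

In this scheme only the small policy sub-network would be tuned per task, which is what keeps the number of real-robot attempts low.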
On the Lipschitz Constant of Deep Networks and Double Descent
Existing bounds on the generalization error of deep networks assume some form
of smooth or bounded dependence on the input variable, falling short of
investigating the mechanisms controlling such factors in practice. In this
work, we present an extensive experimental study of the empirical Lipschitz
constant of deep networks undergoing double descent, and highlight
non-monotonic trends strongly correlating with the test error. Building a
connection between parameter-space and input-space gradients for SGD around a
critical point, we isolate two important factors -- namely loss landscape
curvature and distance of parameters from initialization -- respectively
controlling optimization dynamics around a critical point and bounding model
function complexity, even beyond the training data. Our study presents novel
insights on implicit regularization via overparameterization and on effective
model complexity for networks trained in practice.
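A common way to measure an empirical (local) Lipschitz constant of the kind studied here is the largest input-gradient norm over the data; the sketch below is one such estimator, assuming a classifier `model` and a standard data loader (both placeholders).

    import torch

    def empirical_lipschitz(model, loader, loss_fn=torch.nn.CrossEntropyLoss()):
        """Lower-bound estimate of the Lipschitz constant of the loss w.r.t. the
        input: the largest per-sample input-gradient norm over the dataset."""
        model.eval()
        worst = 0.0
        for x, y in loader:
            x = x.clone().requires_grad_(True)
            loss = loss_fn(model(x), y)
            (grad,) = torch.autograd.grad(loss, x)
            # one gradient norm per sample in the batch
            norms = grad.flatten(start_dim=1).norm(dim=1)
            worst = max(worst, norms.max().item())
        return worst

Tracking this quantity across model sizes or training epochs is what makes the non-monotonic, double-descent-like trends visible.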
A Multimodal Data Set of Human Handovers with Design Implications for Human-Robot Handovers
Handovers are basic yet sophisticated motor tasks performed seamlessly by
humans. They are among the most common activities in our daily lives and social
environments. This makes mastering the art of handovers critical for a social
and collaborative robot. In this work, we present an experimental study that
involved human-human handovers by 13 pairs, i.e., 26 participants. We record
and explore multiple features of handovers amongst humans, aimed at inspiring
handovers between humans and robots. With this work, we further create and
publish a novel data set of 8672 handovers, bringing together human motion and
the forces involved. We further analyze the effect of object weight and the
role of visual sensory input in human-human handovers, as well as possible
design implications for robots. As a proof of concept, the data set was used
for creating a human-inspired data-driven strategy for robotic grip release in
handovers, which was demonstrated to result in better robot-to-human handovers.
Comment: The data set of human-human handovers can be found at:
https://github.com/paragkhanna1/datase
Data-driven Grip Force Variation in Robot-Human Handovers
Handovers frequently occur in our social environments, making it imperative
for a collaborative robotic system to master the skill of handover. In this
work, we aim to investigate the relationship between the grip force variation
for a human giver and the sensed interaction force-torque in human-human
handovers, utilizing a data-driven approach. A Long Short-Term Memory (LSTM)
network was trained to use the interaction force-torque in a handover to
predict the human grip force variation in advance. Further, we propose to
utilize the trained network to cause human-like grip force variation for a
robotic giver.
Comment: Contributed to the "Advances in Close Proximity Human-Robot
Collaboration" workshop at the 2022 IEEE-RAS International Conference on
Humanoid Robots (Humanoids 2022).
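A minimal sketch of such a predictor, assuming 6-axis force-torque input sequences and a scalar grip-force value per time step; the layer sizes and sequence length are assumptions, not the trained network described above.

    import torch
    import torch.nn as nn

    class GripForcePredictor(nn.Module):
        """LSTM mapping a 6-axis force-torque sequence to a grip-force sequence."""
        def __init__(self, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(input_size=6, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)  # scalar grip force per time step

        def forward(self, ft_sequence):        # (batch, time, 6)
            out, _ = self.lstm(ft_sequence)
            return self.head(out).squeeze(-1)  # (batch, time)

    # Usage: predict grip-force variation from 100 sensed force-torque samples.
    ft = torch.randn(8, 100, 6)
    grip = GripForcePredictor()(ft)  # shape: (8, 100)

A recurrent model fits this setting because the sensed interaction force-torque arrives as a stream during the handover, and the giver's grip force must be anticipated slightly ahead of it.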
Deep Double Descent via Smooth Interpolation
The ability of overparameterized deep networks to interpolate noisy data,
while at the same time showing good generalization performance, has been
recently characterized in terms of the double descent curve for the test error.
Common intuition from polynomial regression suggests that overparameterized
networks are able to sharply interpolate noisy data, without considerably
deviating from the ground-truth signal, thus preserving generalization ability.
At present, a precise characterization of the relationship between
interpolation and generalization for deep networks is missing. In this work, we
quantify sharpness of fit of the training data interpolated by neural network
functions, by studying the loss landscape w.r.t. the input variable locally
to each training point, over volumes around cleanly- and noisily-labelled
training samples, as we systematically increase the number of model parameters
and training epochs. Our findings show that loss sharpness in the input space
follows both model- and epoch-wise double descent, with worse peaks observed
around noisy labels. While small interpolating models sharply fit both clean
and noisy data, large interpolating models express a smooth loss landscape,
where noisy targets are predicted over large volumes around training data
points, in contrast to existing intuition.
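One simple way to probe input-space loss sharpness around a single training point is to sample perturbations inside a small ball around the input and measure how much the loss moves; the radius, sample count, and summary statistic below are assumptions for illustration, not the paper's exact protocol.

    import torch

    def input_space_sharpness(model, x, y, loss_fn, radius=0.1, n_samples=64):
        """Loss deviation over random perturbations inside an L2 ball around x.

        x is assumed to be a single input with a leading batch dimension of 1."""
        model.eval()
        with torch.no_grad():
            base = loss_fn(model(x), y).item()
            deviations = []
            for _ in range(n_samples):
                delta = torch.randn_like(x)
                delta = radius * delta / delta.norm()  # project onto the ball surface
                perturbed = loss_fn(model(x + delta), y).item()
                deviations.append(abs(perturbed - base))
        # larger deviation around the base loss = sharper fit near this point
        return max(deviations)

Evaluating this separately around cleanly- and noisily-labelled training points is what exposes the model- and epoch-wise double descent of sharpness described above.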
Hyperplane Arrangements of Trained ConvNets Are Biased
We investigate the geometric properties of the functions learned by trained
ConvNets in the preactivation space of their convolutional layers, by
performing an empirical study of hyperplane arrangements induced by a
convolutional layer. We introduce statistics over the weights of a trained
network to study local arrangements and relate them to the training dynamics.
We observe that trained ConvNets show a significant statistical bias towards
regular hyperplane configurations. Furthermore, we find that layers showing
biased configurations are critical to validation performance for the
architectures considered, trained on CIFAR10, CIFAR100, and ImageNet.
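For intuition: each filter of a convolutional layer defines a hyperplane in that layer's preactivation (patch) space, so statistics over the flattened filters, such as pairwise angles between their normals, summarize how regular the local arrangement is. The sketch below computes one such statistic; it is illustrative and not the specific statistic used in the paper.

    import torch

    def filter_angle_stats(conv_weight):
        """Pairwise cosine similarities between hyperplane normals (flattened
        conv filters); their distribution characterizes the local arrangement."""
        w = conv_weight.flatten(start_dim=1)              # (out_channels, fan_in)
        w = w / w.norm(dim=1, keepdim=True).clamp_min(1e-12)
        cos = w @ w.t()                                   # (out, out)
        off_diag = cos[~torch.eye(len(cos), dtype=torch.bool)]
        return off_diag.mean().item(), off_diag.std().item()

    # Usage on a trained layer, e.g. (hypothetical model):
    # mean_cos, std_cos = filter_angle_stats(model.features[0].weight.detach())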
FlowIBR: Leveraging Pre-Training for Efficient Neural Image-Based Rendering of Dynamic Scenes
We introduce a novel approach for monocular novel view synthesis of dynamic
scenes. Existing techniques already show impressive rendering quality but tend
to focus on optimization within a single scene without leveraging prior
knowledge. This limitation has been primarily attributed to the lack of
datasets of dynamic scenes available for training and the diversity of scene
dynamics. Our method FlowIBR circumvents these issues by integrating a neural
image-based rendering method, pre-trained on a large corpus of widely available
static scenes, with a per-scene optimized scene flow field. Utilizing this flow
field, we bend the camera rays to counteract the scene dynamics, thereby
presenting the dynamic scene as if it were static to the rendering network. The
proposed method reduces per-scene optimization time by an order of magnitude,
achieving results comparable to existing methods - all on a single
consumer-grade GPU.
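A minimal sketch of the ray-bending idea, assuming a per-scene flow field `scene_flow(points, t)` that returns 3D displacements mapping points observed at time t back to a canonical (static) configuration; the names and signatures are assumptions, not the FlowIBR API.

    import torch

    def bend_ray_samples(origins, directions, depths, t, scene_flow):
        """Offset sample points along camera rays by the scene flow so the
        dynamic scene can be rendered as if it were static.

        origins, directions: (n_rays, 3); depths: (n_rays, n_samples); t: time.
        scene_flow: callable (points (N, 3), t) -> displacements (N, 3) [assumed]
        """
        # 3D sample positions along each ray at time t
        pts = origins[:, None, :] + depths[..., None] * directions[:, None, :]
        flat = pts.reshape(-1, 3)
        # displace every sample back into the canonical static scene
        canonical = flat + scene_flow(flat, t)
        return canonical.reshape(pts.shape)

    # Usage with a dummy identity flow field (scene treated as already static):
    # pts = bend_ray_samples(o, d, z, 0.5, lambda p, t: torch.zeros_like(p))

Because only the flow field is optimized per scene while the image-based renderer stays pre-trained, per-scene optimization time drops sharply.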
TD-GEM: Text-Driven Garment Editing Mapper
Language-based fashion image editing allows users to try out variations of
desired garments through provided text prompts. Inspired by research on
manipulating latent representations in StyleCLIP and HairCLIP, we focus on
these latent spaces for editing fashion items of full-body human datasets.
Currently, there is a gap in handling fashion image editing due to the
complexity of garment shapes and textures and the diversity of human poses. In
this paper, we propose an editing optimization scheme called Text-Driven
Garment Editing Mapper (TD-GEM), aiming to edit fashion items in a disentangled
way. To this end, we initially obtain a latent representation of an image
through generative adversarial network inversions such as Encoder for Editing
(e4e) or Pivotal Tuning Inversion (PTI) for more accurate results. An
optimization-based Contrastive Language-Image Pre-training (CLIP) loss is then
utilized to guide the latent representation of a fashion image in the direction
of a target attribute expressed in terms of a text prompt. Our TD-GEM
manipulates the image accurately according to the target attribute, while other
parts of the image are kept untouched. In the experiments, we evaluate TD-GEM
on two different attributes (i.e., "color" and "sleeve length"), and it
generates realistic images effectively compared to recent manipulation schemes.
Comment: The first two authors contributed equally.
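A minimal sketch of the CLIP-guided latent-optimization loop such a method relies on, assuming a pretrained `generator` mapping a latent code to an image, a `clip_similarity(image, text)` score, and a starting latent `w0` obtained from an inversion method such as e4e or PTI; all three callables are placeholders, not TD-GEM's implementation.

    import torch

    def edit_latent(w0, prompt, generator, clip_similarity,
                    steps=200, lr=0.05, lam=0.5):
        """Push an inverted latent toward a text prompt while staying close to w0,
        so the edit follows the prompt but the rest of the image stays untouched."""
        w = w0.clone().requires_grad_(True)
        opt = torch.optim.Adam([w], lr=lr)
        for _ in range(steps):
            image = generator(w)
            # maximize CLIP agreement with the prompt, penalize drift from w0
            loss = -clip_similarity(image, prompt) + lam * (w - w0).pow(2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return w.detach()

The drift penalty (weighted by `lam` here) is what encourages disentangled edits: the latent moves only as far as needed to satisfy the text prompt.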